Text Encoding Variant
A text encoding variant specifies one among possibly several minor variants of a particular base encoding or group of base encodings. Text encoding variants are often used to support special cases such as the following:
- Differences among fonts that are all intended to support the same encoding. For example, different fonts associated with the MacJapanese and MacArabic encodings support slightly different encoding variants. These fonts would typically coexist on the same system without the user being aware of any differences.
- Artificial variants created by excluding some of the characters in an encoding. For example, the MacJapanese encoding includes separately encoded vertical forms for some characters. In some contexts (such as with QuickDraw GX), it may be desirable to exclude these.
- Different mappings of a particular character or group of characters for different usages.
For a given text encoding base or small set of related text encoding base values, there may be an enumeration of TextEncodingVariant values, which always begins with 0, the default variant. In addition, for a possibly larger set of related text encoding base values, there may be bit masks that can be used independently to designate additional artificial variants. For example, there is an enumeration of six variants for the Mac OS Japanese encoding. In addition, there are bit masks that can also be used as part of the variant for any Japanese encoding to exclude 1-byte kana or to control the mapping of the reverse solidus (backslash) character.
Languages that are dissimilar but use similar character sets are generally not designated as variants of the same base encoding. For example, MacIcelandic and MacTurkish both use a slight modification of the MacRoman character set, but they are considered separate base encodings.
A text encoding variant is defined by the TextEncodingVariant data type.
typedef UInt32 TextEncodingVariant;
When you create a new text encoding, you can specify an explicit variant of a base encoding or you can specify the default variant of that base. The function GetTextEncodingVariant returns the text encoding variant of a text encoding specification.
The following enumeration defines constants for the default variant of any base text encoding and for variants of the Mac OS Japanese, Mac OS Arabic, Mac OS Farsi, Mac OS Hebrew, Mac OS Icelandic, and Unicode base encodings. It also defines option bits that can be combined with the variant for most Japanese encodings.
enum {
    /* Default TextEncodingVariant, for any TextEncodingBase */
    kTextEncodingDefaultVariant = 0,
    /* Variants of kTextEncodingMacJapanese */
    kMacJapaneseStandardVariant = 0,
    kMacJapaneseStdNoVerticalsVariant = 1,
    kMacJapaneseBasicVariant = 2,
    kMacJapanesePostScriptScrnVariant = 3,
    kMacJapanesePostScriptPrintVariant = 4,
    kMacJapaneseVertAtKuPlusTenVariant = 5,
    /* Variant options for most Japanese encodings (including MacJapanese,
       Shift-JIS, EUC-JP, ISO 2022-JP). These can be OR-ed into the
       variant value in any combination. */
    kJapaneseNoOneByteKanaOption = 0x20,
    kJapaneseUseAsciiBackslashOption = 0x40,
    /* Variants of kTextEncodingMacArabic */
    kMacArabicStandardVariant = 0,   /* Cairo font & WorldScript tables */
    kMacArabicTrueTypeVariant = 1,   /* Baghdad, Geeza, Kufi, Nadeem fonts */
    kMacArabicThuluthVariant = 2,    /* Thuluth font */
    kMacArabicAlBayanVariant = 3,    /* Al Bayan font */
    /* Variants of kTextEncodingMacFarsi */
    kMacFarsiStandardVariant = 0,    /* Tehran font & WorldScript tables */
    kMacFarsiTrueTypeVariant = 1,    /* TrueType fonts */
    /* Variants of kTextEncodingMacHebrew */
    kMacHebrewStandardVariant = 0,
    kMacHebrewFigureSpaceVariant = 1,
    /* Variants of kTextEncodingMacIcelandic */
    kMacIcelandicStandardVariant = 0,
    kMacIcelandicTrueTypeVariant = 1,
    /* Variants of Unicode & ISO 10646 encodings */
    kUnicodeNoSubset = 0,
    kUnicodeCanonicalDecompVariant = 2
};
Constant descriptions
kTextEncodingDefaultVariant
- The standard default variant for any base encoding.
Mac OS Japanese variants
kMacJapaneseStandardVariant
- The standard Japanese variant: Shift-JIS with JIS Roman modifications, extra 1-byte characters, 2-byte Apple extensions, and some vertical presentation forms in the range 0xEB40 through 0xEDFE ("ku plus 84").
kMacJapaneseStdNoVerticalsVariant
- An artificial variant for callers who don't want to use separately encoded vertical forms (for example, developers using QuickDraw GX).
kMacJapaneseBasicVariant
- An artificial variant without Apple 2-byte extensions.
kMacJapanesePostScriptScrnVariant
- The Japanese variant for the screen bitmap version of the Sai Mincho and Chu Gothic fonts.
kMacJapanesePostScriptPrintVariant
- The Japanese variant for PostScript printing versions of the Sai Mincho and Chu Gothic PostScript fonts. This version includes 2-byte half-width characters in addition to 1-byte half-width characters.
kMacJapaneseVertAtKuPlusTenVariant
- The Japanese variant for the Hon Mincho and Maru Gothic fonts used in the Japanese localized version of System 7.1. It does not include the standard Apple extensions and encodes vertical forms at a different location.
Japanese options
kJapaneseNoOneByteKanaOption
- This option indicates that the JIS X0201 kana values should be excluded from the mapping. These characters are 1-byte characters in JIS X0201 and in Shift-JIS (although not in EUC-JP); they are often referred to as half-width kana (although some Shift-JIS versions, including the Mac OS PostScript variants, have both 1-byte and 2-byte half-width kana). These characters are often excluded when interchanging Japanese text on the Internet. This option is not currently supported.
kJapaneseUseAsciiBackslashOption
- This option governs the interpretation of the 1-byte code point 0x5C. This is REVERSE SOLIDUS in ASCII, but YEN SIGN in canonical JIS Roman, JIS X0201, Shift-JIS, and so on. However, when these Japanese encodings are used for file names on systems that treat REVERSE SOLIDUS as a path separator, the 1-byte code 0x5C is usually interpreted as REVERSE SOLIDUS and should be mapped as such. This option is not currently supported.
Mac OS Arabic variants
kMacArabicStandardVariant
- This variant is supported by the Cairo font (the system font for Arabic) and is the encoding supported by the text processing utilities.
kMacArabicTrueTypeVariant
- This variant is used for most of the Arabic TrueType fonts: Baghdad, Geeza, Kufi, Nadeem.
kMacArabicThuluthVariant
- This variant is used for the Arabic PostScript-only fonts: Thuluth and Thuluth bold.
kMacArabicAlBayanVariant
- This variant is used for the Arabic TrueType font Al Bayan.
Mac OS Farsi variants
kMacFarsiStandardVariant
- This variant is supported by the Tehran font (the system font for Farsi) and is the encoding supported by the text processing utilities.
kMacFarsiTrueTypeVariant
- This variant is used for most of the Farsi TrueType fonts: Ashfahan, Amir, Kamran, Mashad, NadeemFarsi.
Mac OS Hebrew variants
kMacHebrewStandardVariant
- The standard Hebrew variant.
kMacHebrewFigureSpaceVariant
- The Hebrew variant in which 0xD4 represents figure space, not left single quotation mark as in the standard variant.
Mac OS Icelandic variants
kMacIcelandicStandardVariant
- The standard Icelandic encoding supported by the bitmap versions of Chicago, Geneva, Monaco, and New York in the Icelandic system. This is also the variant supported by the text processing utilities.
kMacIcelandicTrueTypeVariant
- The variant used for the bitmap versions of Courier, Helvetica, Palatino, and Times in the Icelandic system, and for the TrueType versions of Chicago, Geneva, Monaco, New York, Courier, Helvetica, Palatino, and Times.
Unicode variants
kUnicodeNoSubset
- The standard Unicode encoded character set, in which the full set of Unicode characters is supported.
kUnicodeCanonicalDecompVariant
- A variant of Unicode using maximal decomposition with characters in canonical order. This variant does not include most characters which have a canonical decomposition, such as single characters for accented Latin letters or single characters for Korean Hangul syllables (however, this restriction is relaxed for symbol characters in the range U+2000 to U+2FFF). In TEC Manager 1.3, the Unicode Converter supports this variant for converting to and from Mac OS encodings.
© Apple Computer, Inc.
13 NOV 1997